Technique for producing through watermarking highly tamper-resistant executable code and resulting “watermarked” code so formed

ABSTRACT

Apparatus and an accompanying method, for forming and embedding a highly tamper-resistant cryptographic identifier, i.e., a watermark, within non-marked executable code, e.g., an application program, to generate a “watermarked” version of that code. Specifically, the watermark, containing, e.g., a relatively large number of separate executable routines, is tightly integrated into a flow pattern of non-marked executable code, e.g., an application program, through randomly establishing additional control flows in the executable code and inserting a selected one of the routines along each such flow. Since the flow pattern of the watermark is highly intertwined with the flow pattern of the non-marked code, the watermark is effectively impossible to either remove from the code and/or circumvent. The routines are added in such a manner that the flow pattern of resulting watermarked code is not substantially different from that of the non-marked code, thus frustrating third party detection of the watermark using, e.g., standard flow analysis tools. To enhance tamper-resistance of the watermarked code, each such routine can provide a pre-defined function such that if that routine were to be removed from the marked code by, e.g., a third party adversary, then the marked code will prematurely terminate its execution.

RELATED APPLICATIONS

This application is a continuation of and claims priority to U.S. patent application Ser. No. 09/525,694, filed Mar. 14, 2000, now U.S. Pat. No. 6,829,710, the disclosure of which is incorporated by reference herein.

BACKGROUND OF THE DISCLOSURE

1. Field of the Invention

The invention relates to a technique, including both apparatus and an accompanying method, for forming and embedding a hidden highly tamper-resistant cryptographic identifier, i.e., a watermark, within non-marked computer executable code, e.g., an application program, to generate a “watermarked” version of that code. This technique can also be used to tightly integrate, in a highly tamper-resistant manner, other pre-defined executable code, such as security code, as part of the watermark, into the non-marked code in order to form the watermarked code.

2. Description of the Prior Art

Over the past decade or so, personal computers (PCs) have become rather ubiquitous with PC hardware and software sales experiencing significant growth. However, coincident with an ever widening market for PCs, unauthorized copying of PC software, whether it be application programs or operating systems, continues to expand to rather significant proportions. Given that in certain countries sales lost to such copying can significantly exceed legitimate sales, over the years software manufacturers have attempted to drastically reduce the incidence of unauthorized copying though, practically speaking, with only limited success.

One such technique, probably one of the oldest techniques used and usually rather ineffective, is simply to append a copyright and other legal proprietary rights notices to object code as distributed on mass (magnetic or optical) media. The intention in doing so is to deter unauthorized copying by simply placing a third party on notice that a copy of the program, embodied by that code, is legally protected and that its owner may take legal action to enforce its rights in the program against that party to prevent such copying. These notices can be readily discovered in program code listings and simply excised by the third party prior to copying and distributing illicit copies. Other such notices can be excised by a third party adversary from the software media itself and the program packaging as well. Though these notices are often necessary in many jurisdictions to secure full legal remedies against third parties, in practice, these notices have provided little, if any, real protection against third party copying.

Another technique that is recently seeing increasing use is to require a PC, on which the program is to execute, to hold a valid digital “certificate” provided by the manufacturer of the program. The certificate will typically be loaded as a separate step during manufacture of the PC. During initialization, the program will test the certificate and confirm its authenticity and validity. If the certificate is authentic and valid, the program will continue to execute; otherwise, the program will simply terminate. Unfortunately, the certificate and associated testing routines are often very loosely bound to the remainder of the program code. Currently available software analysis tools can display execution flow among program instructions in a program under test. Consequently, with such tools, a programmer, with knowledge of an operational sequence implemented by the program and by analyzing a flow pattern inherent in that program, as it executes, can readily discern the program code that implements a certificate testing function. Once this code is detected, the programmer can readily excise that portion from the program itself and simply modify the remaining program code, by, e.g., inclusion of appropriate jump instruction(s), to compensate for the excised portion; thus, totally frustrating the protection which the certificate was intended to provide against unauthorized copying. Once having done so, a third party adversary can then produce and distribute unauthorized, but fully executable, copies of the program free of all such protection. Thus, in practice, this approach has proven to be easily compromised and hence afforded very little, if any, real protection against illicit copying.

Other techniques have relied on using serialized hardware or other hardware centric arrangements to limit access to a program to one or more users at one particular PC and preclude that program from being loaded onto another PC. Generally, these techniques, often referred to as “copy protect” schemes and which were popular several years ago, relied on inserting a writeable program distribution diskette, such as a floppy diskette, into a PC and then, during execution of an installation process from that diskette, having that PC store machine specific data, such as a hardware identification code, onto the diskette. Thereafter, during each subsequent installation of the program, an installation process would check the stored machine specific data on the installation diskette against that for a specific PC on which the program was then being installed. If the two pieces of data matched, installation would proceed; otherwise, it would prematurely terminate. Unfortunately, such schemes, while generally effective against unauthorized copying, often precluded legitimate archival copying as well as a legitimate installation of the program on a different PC. In view of substantial inconveniences imposed on the user community, such “copy protect” schemes quickly fell into disuse and hence were basically abandoned shortly after they first saw widespread use. Moreover, any such technique that relies on storing information on the distribution media itself during program installation is no longer practical when software today is distributed on massive read-only optical media, such as CDROM or, soon, digital video disk (DVD).

Therefore, given the drawbacks associated with copy protect and certificate based schemes, one would think that embedding an identifier of some sort into a program, during its manufacture and/or installation and subsequently testing for that identifier during subsequent execution of an installed version of that program at a user PC, would hold promise.

However, for such an identifier based approach to be feasible, a need exists in the art for an identifier, such as a watermark, that can be tightly integrated into a program itself such that the watermark would be extremely difficult, if not effectively impossible, for a third party to discern, such as through flow analysis, and then circumvent, such as by removal.

In particular, such a watermark could be embedded in some fashion into a non-marked program. Then, subsequently, at runtime of an installed version of that program at a user PC, a “secret” key(s) based cryptographic process could be used to reveal the presence of and test the watermark. The key(s) would be separately stored down, to the PC, as a software value(s). If the correct watermark were then detected, execution of the installed program would continue; else, execution would halt. Fortunately, such an approach would likely impose essentially no burden on, and preferably be totally transparent to, the user, and not frustrate legitimate copying.

If such an identifier could be made sufficiently impervious to third party detection and tampering, then advantageously its use, with, for example, such an approach, may well prove effective, in practice, at reducing unauthorized third party copying.

SUMMARY OF THE INVENTION

Our present invention advantageously satisfies this need and overcomes the deficiencies in the art through a watermark, containing, e.g., a relatively large number of executable routines, that is tightly integrated into a flow pattern of non-marked executable code, e.g., an application program, through randomly establishing additional control (execution) flows in the executable code and inserting a selected one of the routines along each such flow. Since a resulting flow pattern of the watermark is highly intertwined with (tightly spliced into) the flow pattern of the non-marked code, the watermark is effectively impossible to either remove from the code and/or circumvent. Furthermore, the code for the routines themselves is added in such a manner that the flow pattern of resulting “watermarked” code is not substantially different from that of the non-marked code. Hence, the watermark is also extremely difficult for a third party adversary to discern using, e.g., standard flow analysis tools and human inspection.

Advantageously, to enhance tamper-resistance of the watermarked code, each routine, that constitutes a portion of the watermark, can provide a pre-defined function such that, if that routine were to be removed from the marked code by, e.g., the third party adversary, then the marked code will prematurely terminate its execution.

In accordance with our specific inventive teachings, unmarked executable code, which forms, e.g., an application program that is to be watermarked, is first converted, using a conventional software flow analysis tool, into its corresponding flow graph. Predefined security code, typically constituting specific predetermined executable software code, is also converted, through use of the same tool, into its corresponding flow graph. The security code can itself constitute, for example: specific “watermark” code, i.e., executable code having as its primary, if not sole, purpose to form a portion of a watermark (i.e., distinct from the application program itself); a complete image of the entire application program itself; or just a portion of that program. In that regard, the unmarked executable code and the security code can each be formed of a different half of a common application program.

Thereafter, each of the flow graphs is k-partitioned to yield cluster flow graphs G′ and H′, respectively (where k is a pre-defined integer, such as illustratively 1000 for a large application program then being watermarked), each having k clusters of nodes (each being a partition). M edges (links) (where M is typically a large pre-defined integer, such as illustratively 500,000 for that application program) are collectively inserted between corresponding pairs of randomly selected nodes in: (a) graphs G′ and H′; and, where desired, (b) different clusters solely in graph G′, and/or (c) different clusters solely in graph H′. For each edge, a routine is selected from a pre-defined library, based on, e.g., minimizing adverse effects on program flow, and its designation is inserted along that edge in the flow graph(s), specifically at one of the nodes associated with that edge. All the edges collectively and effectively splice clustered flow graphs G′ and H′ together into a single combined flow graph. Executable code is then produced which corresponds to that depicted in the single combined flow graph. The watermark is collectively defined by the routines and edges that have been inserted into the unmarked code.

One illustrative heuristic for selecting each specific pair of nodes, in, e.g., cluster graphs G′ and H′ that are to be joined by an edge, is as follows. First, randomly pick a node, U, in graph G′. With λ being pre-defined as equaling (a number of edges that are to transit between G′ and H′)/(a number of edges connected to U), then, with a probability of 1-λ, randomly choose a node, Y, in graph H′. Then, with a probability of λ, randomly choose a node, Z other than U, in graph G′. Finally, provide, as output, designations of nodes Y and Z as a nodal pair.

During subsequent edge insertion, connect the nodes for that edge together, e.g., nodes Y and Z, so as to insert an edge extending between cluster flow graphs G′ and H′. Based on proper program flow, insert an appropriate routine from the library along that edge and at an appropriate node, in a graph, for that pair. Repeat these node selection and insertion steps until all M edges and designations for associated routines are collectively added to cluster graphs G′ and H′ so as to fully splice both graphs and the associated routines into a single combined flow graph. Parameters k, M and λ are preferably kept secret.
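By way of illustration only, the node selection and edge insertion steps can be sketched as follows. This is a minimal sketch under stated assumptions and is not the procedure of the figures: the heuristic is read here as "the partner of U lies in G′ with probability λ and in H′ otherwise", `lam` is treated as a probability between 0 and 1, and the routine for each edge is drawn at random rather than chosen on the basis of program flow. The names `select_pair`, `insert_edges` and the routine labels are hypothetical.

```python
import random

def select_pair(g_nodes, h_nodes, lam, rng):
    """Pick U in G', then a partner in G' (prob. lam) or in H' (prob. 1 - lam).

    Assumption: this is one reading of the heuristic; lam must lie in [0, 1].
    """
    u = rng.choice(g_nodes)
    if rng.random() < lam:
        z = rng.choice([n for n in g_nodes if n != u])  # node Z other than U
        return ("G", u), ("G", z)
    y = rng.choice(h_nodes)                             # node Y in H'
    return ("G", u), ("H", y)

def insert_edges(g_nodes, h_nodes, library, m, lam, seed):
    """Return m inserted edges, each tagged with a routine from the library."""
    rng = random.Random(seed)
    inserted = []
    for _ in range(m):
        a, b = select_pair(g_nodes, h_nodes, lam, rng)
        # Assumption: random choice stands in for flow-based routine selection.
        routine = rng.choice(library)
        inserted.append((a, b, routine))
    return inserted

# Hypothetical toy inputs: four nodes per graph and three routine labels.
g_nodes = ["g1", "g2", "g3", "g4"]
h_nodes = ["h1", "h2", "h3", "h4"]
library = ["r1", "r2", "r3"]
splice = insert_edges(g_nodes, h_nodes, library, m=6, lam=0.25, seed=2000)
```

The list `splice` plays the role of the inserted edges and routine designations that join the two cluster graphs; a real implementation would also record at which node of each pair the routine is placed.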

Each of these routines is predetermined, usually quite compact, requires relatively little execution time and executes a pre-defined, often self-contained operation, such as, e.g., computing a cryptographic key for use in printing or decoding a variable, or decrypting a ciphered variable. Each of the operations is chosen so as not to require much processing time; thus, not noticeably degrading execution of the watermarked program. Collectively, the routines that are inserted are such that, for proper execution of the watermarked program, they must all be executed and, to a certain extent, in a given sequence. In that regard, if any one or more of these routines is removed from the watermarked program, such as by a third party adversary, that program will gracefully terminate its execution.

To further frustrate its detection, the code for all the inserted routines is collectively scattered approximately uniformly throughout the “watermarked” program as that program is being constructed from its combined flow graph. In this manner, the routines will not be centralized in any one portion of the watermarked program. Furthermore, each of these routines is written with standard code “obfuscation” techniques to further camouflage their functionality.

Advantageously, as a feature, the present invention can securely watermark any executable code, whether it forms, e.g., an application program, an operating system (O/S) or a software module.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts a simplified high-level block diagram of conventional watermarking process 100;

FIG. 2 depicts a simplified and illustrative flow graph produced through conventional program flow analysis;

FIG. 3 depicts a simplified high-level block diagram of our inventive technique for securely watermarking software;

FIG. 4 depicts a detailed block diagram of our inventive technique shown in FIG. 3;

FIG. 5 depicts table 500 of illustrative portions of a code image that can be applied as separate inputs to our present invention for watermarking that entire code image;

FIG. 6 depicts illustrative edge and associated routine insertion in cluster flow graphs G′ and H′ that can arise from use of our present inventive technique, and resulting execution (control) flow among various routines, one of which having been inserted;

FIG. 7 depicts a high-level block diagram of computer system 700, illustratively a personal computer (PC), that can be used to implement our present invention;

FIG. 8 depicts a high-level flowchart of watermarked code generation procedure 800 that is executed by computer system 700 to implement our present invention;

FIG. 9 depicts correct alignment of the drawing sheets for FIGS. 9A and 9B;

FIGS. 9A and 9B collectively depict a high-level flowchart of edge insertion procedure 900 that is executed as part of procedure 800 shown in FIG. 8; and

FIG. 10 depicts a flowchart of node selection procedure 1000 which is executed as part of procedure 900 shown in FIG. 9.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

After considering the following description, those skilled in the art will clearly realize that the teachings of our present invention can be utilized in a wide range of applications for generating and incorporating a pre-defined watermark into nearly any type of computer executable code, i.e., a non-marked program (or a portion of it), or for that matter into any software object, such as an audio file, containing, e.g., music, or a video file, containing, e.g., a movie, that is to be watermarked. Through this technique, substantial tamper resistance can be imparted to the marked object itself; and other executable code, such as a pre-defined executable security code or a pre-defined executable, can be tightly integrated into the non-marked object in a manner which assures that a resulting watermarked object, which contains both the non-marked object and the watermark, is itself substantially tamper-resistant.

To clearly elucidate the invention and facilitate reader understanding, by way of background, we will first very briefly describe well known, though separate, concepts of watermarking and program flow analysis. Thereafter, we will discuss our present invention in the context of its use for imparting a watermark, specifically implemented through pre-defined executable code, into a non-marked application program intended for subsequent execution on a personal computer (PC). During the course of this discussion, we will address the overall constituents of the watermark and its variations.

A. Overview

1. Watermarking

FIG. 1 depicts a simplified high-level block diagram of conventional watermarking process 100. In essence, an object, O, to be protected, whether it be a printed image, a document, a piece of paper currency or some other such item, is applied, as symbolized by line 105, to marking process 110 situated at an originating location. This process creates a watermark and embeds it in the object to create a watermarked object, O′. The watermarked object is then eventually transported through insecure channel 115, whether it be, e.g., transit through a public carrier or, as in the case of currency, public distribution, to a destination location. In a typical situation, a malicious attacker may intercept watermarked object O′, as it travels through channel 115, and try to remove the watermark from O′ by subjecting O′ to his or her own transformations. Thus, an object, O″, ultimately received at a destination location, may not be exactly the same as what it was, i.e., O′, when sent. At the destination, the watermarked object O″ (if a third party attempted to modify the object) or O′ (if the object was not modified) is subjected to watermark recovery process 130 which attempts to recover the watermark from the object and, based on a result of the recovery process, indicates, as symbolized by output line 135, whether the watermark is present or not in the relevant object. This indication can be used to signify whether the received watermarked object, then situated, as symbolized by line 120, at the destination is legitimate or not. Since the legitimacy of the object is directly governed by the security of the watermark, the watermark itself must be as difficult as possible for a third party to copy or alter.

2. Program Flow Analysis

FIG. 2 depicts a simplified and illustrative flow graph produced through conventional program flow analysis.

In essence and to the extent relevant, flow analysis, as symbolized by arrow 215, transforms executable code, such as program listing 210, into a connection graph, such as graph 220. Each node within the graph represents a separate instruction in the code, with typically a first such instruction encountered in the code represented by an uppermost (“root”) node. Here, separate instructions in listing 210 are graphically represented by nodes 225 having separate illustrative nodes 225₁, 225₂, 225₃, 225₄, 225₅, 225₆, 225₇, 225₈, 225₉, 225₁₀ and 225₁₁. Execution flow from one instruction to another is indicated by a line (commonly and hereafter referred to as an “edge” or link) that connects the two nodes representing these particular instructions. As shown, instructions represented by nodes 225₁ and 225₂ are connected by edge 230₁ thereby indicating that execution flow transits between these nodes, e.g., from the instruction represented by node 225₁ to that represented by node 225₂. Similarly, nodes 225₂ through 225₁₁ are collectively connected by edges 230₂ through 230₁₃. The specific nodes and their interconnecting edges, which form a collective pattern, graphically depict the execution flow among the illustrative instructions in associated executable code, here represented by nodes 225₁ through 225₁₁ within program listing 210.

Illustrative graph 220 is quite small and greatly simplified. In actuality, with relatively large application programs, corresponding flow graphs can become extremely large, with a substantial number of nodes and edges, and graphically depict highly complex execution flows.
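To make the node and edge terminology concrete, a flow graph of this kind can be held as a simple adjacency map. The following is only a sketch: the five-node listing is a hypothetical toy example and is not graph 220 of FIG. 2, and the helper name `build_flow_graph` is our own.

```python
from collections import defaultdict

def build_flow_graph(edges):
    """Return an adjacency map {node: sorted successors} from (src, dst) pairs."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst]  # touching the key makes nodes with no successors appear too
    return {node: sorted(succs) for node, succs in graph.items()}

# Hypothetical five-instruction listing: node 1 is the root,
# node 2 branches to node 3 or node 4, and both paths rejoin at node 5.
toy_edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]
flow_graph = build_flow_graph(toy_edges)
edge_count = sum(len(succs) for succs in flow_graph.values())
```

Each key is a node (one instruction) and each listed successor is an edge along which execution can transit.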

Advantageously, the present invention is independent of any particular technique through which flow analysis is effectuated and, in all likelihood, will properly function with any one of a wide variety of techniques provided that only one such technique is consistently used throughout any single implementation of the present invention. Hence, for brevity, we will omit all details of how flow analysis is actually performed to yield a corresponding flow graph for associated executable code.

In a similar way to the program control flow graph, one can assign a data flow graph to given code. Clearly, our invention can function with any type of flow graph assigned to such code, regardless of whether that graph is associated with data and/or control flow.

B. Inventive Software Watermarking Technique

1. Process Overview

FIG. 3 depicts a simplified high-level block diagram of our inventive technique for securely watermarking software.

In essence, our present invention produces a watermark, illustratively containing a large number of separate executable routines, that is tightly integrated into a flow pattern of non-marked executable code, e.g., an application program, through randomly establishing additional execution (control) flows in the executable code and inserting a selected one of the routines along each such flow. Our invention can generate and embed such a watermark in substantially any executable code, such as an application program, an operating system (O/S) or a software module. For purposes of illustration, we will assume, as noted above, this code is an application program for execution on a PC.

Specifically, as shown, unmarked executable code, X, that forms, e.g., an application program that is to be watermarked, is applied to watermark code generation process 300. Process 300 first generates, through conventional flow analysis tool 310, corresponding flow graphs for this code. In actuality, this process, as implemented in a PC (or other computer system), will generate a representation of each such graph. For simplicity, the term “graph”, as used herein, is now defined to encompass not only the graph itself but also all digital and other electronic representations thereof. As will be described below in conjunction with FIGS. 4 and 5, code X can contain not only the non-marked application code itself but also other executable code, such as pre-defined security code, that is to be spliced into the non-marked application code.

Once the flow graphs are generated, those graphs are applied to watermark code generator 320. Generator 320 randomly selects pairs of nodes in both of the flow graphs, inserts an edge (i.e., to establish an execution flow) between the nodes in each pair and also inserts a pre-defined routine (both collectively viewed as an executable “procedure”) (r_(i)) from library 330 (having routines 330₁, 330₂, ... collectively labeled as r) at an appropriate one of the nodes in each such pair; thus, ultimately forming a single combined flow graph produced by generator 320. By this we mean the following. Code segments associated with routine r_(i) are added to the original program X so that the behaviour of the program in terms of its input and output as well as functionality are almost the same as the original program. Further, these segments are designed to appear as if they are a normal part of original code X so that these segments are not readily apparent, unless the attacker expends considerable effort to understand the code and its behaviour. An illustrative implementation could be that, e.g., routine r_(i+1) is called by some code to alter data or variables (or code) in some segment and later, after some execution, routine r_(i+2) is called to reverse this change. Without the reversal, the watermarked program would fail to operate normally. The flow graphs of the code for the inserted routine may encode additional information which itself may also be a part of the watermark. The information in the watermark may also be in various components of H′, which is discussed below.

Now, we will describe an exemplary implementation of how one may embed information in a flow graph that has a distinguished start node and where the nodes are canonically labeled using some scheme (e.g., by integers, short binary strings or elements from a finite field). From the distinguished node, one may traverse the graph using any one of many conventional traversal schemes which visit all the nodes in a predetermined sequence. Illustrative ways of doing so include depth first search and breadth first search. Here, we may modify nodal ordering using a pseudo-randomized strategy. For example, where the traversal is at a particular node and when deciding which node to visit next, some possible candidates may be pseudo randomly selected for omission. This provides some resilience inasmuch as an attacker changing the code and the associated flow graph may not know which candidates are omitted and included. Further, the non-omitted candidates may be pseudo randomly ordered and then visited in that order. Any such traversal will yield a sequence (order) of nodes visited, and their nodal labels in that order will yield a parameter, i.e., a specific code (so as not to confuse the reader, the term “code” here is being used in a cryptographical sense and not referring to software code) that represents the graph. We may use more than one traversal. The parameter is likely to be distinct for each graph and its canonical labeling. To further enhance the resilience, an error correction procedure may be applied over this parameter, for example, using majority logic decoding procedures or algebraic coding procedures. In the latter case, the graphs and the labeling, and the pseudo-random choices may need to be pre-configured so that the error correction may be applied meaningfully. Then, the output of a decoding process (or a decryption process using a cipher that employs a key derived from the watermark key) would be a message, decrypted using a secret key that is embedded in the graph. The distinguished node may not be easily identifiable by an attacker who does not know the secret, whereas given the secret key one can readily identify the distinguished node. In particular, one method to accomplish this would be that, while in the detection phase, one tries each possible candidate node and tries to find an associated code of a subgraph of certain fixed size that contains that node as a root. If a resulting derived parameter appears meaningless or an attempt to recover the parameter fails for some reason, one concludes there are no watermarks; else, one concludes that there is a watermark. Examples of such parameters (cryptographic codes), which are well known in coding theory, are described in, e.g., J. H. van Lint, Introduction to Coding Theory, (© 1998, Springer Verlag).
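The keyed traversal described above can be sketched as follows. This is a minimal illustration under assumptions of our own: a depth first strategy is used, the pseudo-random generator is Python's `random.Random` seeded from a SHA-256 digest of the secret (neither choice comes from the description), and `omit_prob` is a hypothetical parameter controlling how often a candidate is dropped. Error correction is not shown.

```python
import hashlib
import random

def keyed_traversal(graph, start, secret, omit_prob=0.25):
    """Depth-first walk whose omissions and visiting order depend on a secret.

    graph:  {label: iterable of successor labels}
    secret: bytes; seeds the pseudo-random choices (assumed seeding scheme)
    Returns the sequence of node labels visited, i.e. the derived parameter.
    """
    rng = random.Random(hashlib.sha256(secret).digest())
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        candidates = [n for n in sorted(graph.get(node, ())) if n not in seen]
        # Pseudo-randomly omit some candidates, then pseudo-randomly order the rest.
        kept = [n for n in candidates if rng.random() >= omit_prob]
        rng.shuffle(kept)
        stack.extend(kept)
    return order

# Hypothetical labeled graph with distinguished start node "a".
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
parameter = keyed_traversal(toy_graph, "a", b"watermark-secret")
```

Because every choice is driven by the seeded generator, a holder of the same secret reproduces the same label sequence, whereas an attacker without it does not know which candidates were omitted or in what order the rest were visited.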

Furthermore, any pseudo-random number generator needs a secret random key which may be derived from a key used for watermarking using any one of many conventional ways. For example, to derive numeric key k₁ from secret key S, we can use k₁=HASH(S,1), where HASH may be a collision resistant hash function such as MD5 or SHA1. For a description of these functions, the reader is referred to A. J. Menezes et al, Handbook of Applied Cryptography, (© 1997; CRC Press, LLC).
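The derivation k₁=HASH(S,1) might be sketched as below. How S and the index are combined before hashing is not specified above, so the concatenation with a 4-byte big-endian index is an assumption, as is the function name `derive_key`. SHA-1 is used only because it is one of the functions named; neither MD5 nor SHA-1 is regarded as collision resistant today, so a current implementation would likely substitute a newer hash.

```python
import hashlib
import random

def derive_key(secret: bytes, index: int) -> int:
    """Return a numeric key k_index = HASH(secret, index).

    Assumption: the two inputs are combined by appending the index as a
    4-byte big-endian integer; the text above does not fix an encoding.
    """
    data = secret + index.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

S = b"hypothetical watermarking secret"
k1 = derive_key(S, 1)
k2 = derive_key(S, 2)

# The derived key can then seed a pseudo-random generator.
generator = random.Random(k1)
```

Distinct indices give independent-looking keys from the single secret S, so separate pseudo-random decisions (for example, node selection and traversal ordering) need not share one seed.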

The watermark is collectively defined by the routines and edges (the latter collectively implementing a pattern of associated execution flows) that have been added to the unmarked code and, where used, any cryptographic parameter that is detected from corresponding graphs.

Thereafter, generator 320, using the unmarked code and code for routines r, assembles the marked executable code, for the program, from that represented by the combined flow graph. The resulting marked executable code is watermarked code X′, here being a watermarked application program.

Since the flow pattern of the watermark is highly intertwined with (tightly spliced into) the flow pattern of the non-marked code, the watermark is effectively impossible to either remove from the watermarked code and/or circumvent. Furthermore, since the code for the routines themselves is added in such a manner that the flow pattern of resulting watermarked code is not substantially different from that of the non-marked code, the watermark is also extremely difficult for the third party adversary to discern using, e.g., standard flow analysis tools.

To enhance tamper-resistance of the watermarked code, each routine, that constitutes a portion of the watermark, can provide, as discussed above, a pre-defined function such that if that routine were to be removed from the marked application code by, e.g., a third party adversary, then that application will prematurely terminate its execution.

Each of these routines is pre-defined, as noted above. Each is usually quite compact in size (i.e., contains a relatively small number of instructions) and executes a pre-defined “small”, often self-contained operation (task), such as, e.g., computing a cryptographic key for use in printing or decoding a variable, decrypting a ciphered variable, encrypting a variable or data value prior to either its use or storage, and/or validating either a stored parameter or an input value. Each of these operations is chosen so as not to require much processing time; thus, not noticeably degrading execution of the application program itself. Multiple routines could be inserted to provide a combined functionality. In that regard, one such routine can be inserted in the program flow to encrypt a plaintext data value into an enciphered value, and another routine could be inserted into that flow at an appropriate location to decrypt the enciphered value immediately before the application program needs to use the plaintext data value. Collectively, the routines are preferably such that, for proper execution of the marked program, they must all be executed and, to a certain extent, in a given sequence. In that regard, if any one or more of these routines is removed from the watermarked application program, such as by a third party adversary, then that program will gracefully terminate its execution.
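The paired encrypt and decrypt routines just described can be illustrated with a toy example. It is a sketch only: the XOR mask, the constant `KEY` and the function names are hypothetical stand-ins for real inserted routines, and the failure here surfaces as a Python `KeyError` rather than a controlled termination.

```python
KEY = 0x5A  # hypothetical masking constant shared by the two routines

def encrypt_value(value, key=KEY):
    """First inserted routine: mask a plaintext value."""
    return value ^ key

def decrypt_value(value, key=KEY):
    """Second inserted routine: undo the mask just before the value is used."""
    return value ^ key

def marked_program(x, decrypt=decrypt_value):
    """Toy 'application' that squares x (0..15) through a lookup table."""
    stored = encrypt_value(x)   # value is kept in enciphered form
    # ... unrelated application work would happen here ...
    plain = decrypt(stored)     # must run before the value is used
    squares = {i: i * i for i in range(16)}
    return squares[plain]       # fails if the mask was never removed

result = marked_program(3)
```

Passing an identity function in place of `decrypt_value` simulates an adversary excising the second routine: the lookup then receives the still-masked value and the program cannot complete.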

To further frustrate its detection, the code for all the insertedroutines is collectively scattered approximately uniformly throughoutthe watermarked application program as generator 320 constructs thatprogram from the combined flow graph. In this manner, the routines willnot be centralized in any one portion of the watermarked program.Furthermore, each of these routines is written with conventionalstandard code “obfuscation” techniques to further camouflage theirfunctionality.

2. Detailed Process Depiction

With the above in mind, FIG. 4 depicts a detailed block diagram of our inventive technique, with FIG. 5 depicting table 500 of illustrative code portions that can be applied as separate inputs to our technique. To simplify understanding, the reader should simultaneously refer to both of these figures throughout the following discussion.

Non-marked executable code, e.g., an application program, that is to be watermarked through process 300 is split into two portions and separately applied as input code portions G and H on input lines 403 and 407, respectively. These portions, as shown in table 500, can illustratively comprise two identical images of the same non-marked code, two different halves of the same non-marked code, or a complete image of that code and the security code. Since the partitioning of the non-marked code is not critical, other fractional partitioning of the non-marked code can be used instead, such as illustratively ¼ and ¾ of the image for portions G and H, respectively, and with or without inclusion of the security code. The security code can also contain specific “watermark” code, e.g., executable code distinct from the non-marked application program, which, as its primary, if not sole, function, forms a portion of the watermark (or, as described below, provides other functionality totally apart from that of the application program). The security code can also be duplicated or fractionally partitioned, as desired, across both code portions G and H.

Within process 300, code portions G and H are applied to flow analyzer 310 and specifically subjected to corresponding flow analysis operations 413 and 417, to yield corresponding flow graphs {tilde over (G)} and {tilde over (H)}, respectively. Both flow analysis operations employ the same conventional flow analysis technique. Both of these flow graphs are then routed to partitioner 420 where flow graphs {tilde over (G)} and {tilde over (H)} are k-partitioned, via corresponding k-partitioning operations 423 and 427, to yield cluster flow graphs G′ and H′, respectively. A k-partitioning algorithm decomposes its input into k pieces where each piece has approximately the same number of nodes, and the number of edges between the pairs of pieces is as small as possible and approximately the same. It can be heuristically assumed that a graph over the partitioning has a fairly rich structure, for example, as a randomly generated graph having the same number of nodes and edges. In fact, any partitioning of the original graph will suffice for use with our invention, with that described herein being only one such example. Both of the k-partitioning operations are identical and can be implemented by any one of a wide variety of conventional partitioning techniques. Each of the resulting cluster flow graphs contains k clusters (each cluster being a “partition”) of nodes where k is a pre-defined “secret” value (illustratively 1000 for a large application program having tens of Mbytes of instructions though the exact value of k is not critical) provided, as symbolized by line 421, as input to process 300. Partitioning the flow graphs in this fashion renders the resulting cluster flow graphs more manageable over flow graphs {tilde over (G)} and {tilde over (H)} while increasing complexity of the cluster graphs over that of the non-partitioned flow graphs. Generally, cluster size (inverse of k) is selected such that interaction between different clusters, during program execution as shown by the corresponding flow graphs, is relatively low. An identical cluster size is used by partition operations 423 and 427.
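Since any partitioning will suffice, the k-partitioning step can be pictured with a deliberately simple stand-in. The sketch below is an assumption for illustration, not the conventional partitioning technique an implementation would likely use: it orders nodes breadth-first, so that neighbouring nodes tend to fall in the same cluster, and then cuts that order into k nearly equal pieces. The function name `k_partition` and the adjacency-dictionary representation are hypothetical.

```python
from collections import deque


def k_partition(graph: dict, k: int) -> list:
    """Split the nodes of `graph` (an adjacency dict) into k nearly equal clusters.

    This is a simple heuristic; it makes no attempt to minimize the number
    of edges that cross between clusters.
    """
    order, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:                       # breadth-first walk from `start`
            node = queue.popleft()
            order.append(node)
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    size, extra = divmod(len(order), k)    # the first `extra` clusters get one more node
    clusters, pos = [], 0
    for i in range(k):
        step = size + (1 if i < extra else 0)
        clusters.append(order[pos:pos + step])
        pos += step
    return clusters
```

For a ten-node chain `{0: [1], 1: [2], ..., 9: []}` and k = 3, this yields clusters of four, three and three consecutive nodes. A production partitioner would additionally try to keep inter-cluster edges few, as the text above describes.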

Resulting cluster flow graphs G′ and H′ are then routed, as symbolized by lines 433 and 437, as input to edge definition operation 440. This operation establishes edges between nodes in different clusters in graphs G′ and H′, and, where desired, different clusters solely in graph G′ and solely in graph H′. To do so, operation 440 randomly selects, based on secret input parameter λ (where 0≦λ≦1), M pairs of nodes, collectively spanning the desired clusters, and inserts an edge linking the nodes in each such pair, where M is also a “secret” value (typically a large pre-defined integer, such as illustratively 500,000 for a large application program having on the order of tens of Mbytes of instructions though the exact value of M is not critical). An illustrative algorithm for selecting these nodes, based on λ, will be described below in conjunction with FIG. 10. These nodal pairs will include those pairs having one node in a cluster in graph G′ with the other node in a cluster in graph H′ and, if desired, those pairs with both nodes in different clusters solely within graph G′ and/or both nodes solely within different clusters within graph H′.

As each edge is established by operation 440, designations of the nodes associated with that edge are applied both to watermark generator 450 and to routine selector 470. Selector 470, given the location of each of these nodes within the execution flow of the application program, will select an appropriate routine from library 330 (r), specifically routines r₁ (also designated 330₁), . . . , routine r_(n) (also designated 330_(n)) that is to be inserted along that edge and a specific node on that edge at which that routine is to be inserted. The routines so inserted alter the data or variables locally and create data dependencies that cannot be easily analyzed with a data flow analysis program. For example, routine r₁ inserted in G′ may place a call (and cause data dependencies using a random looking but efficiently inverted operation) to a copy of routine r₂ inserted in H′ where this call is associated with an edge inserted, as described below, from G′ to H′. A variable altered in r₂ may be subject to a transformation that undoes that alteration, but also in an easily inverted but random looking operation. For example, routine r₃ may compute a check-sum of a pre-defined code segment and write that sum into a variable in another segment where, e.g., routine r₅ is inserted (routine r₃ may also check it against a given value to see if the pre-defined code segment has been altered). A label (or other programming designation) of the selected routine is applied, as symbolized by line 475, by selector 470, to one input of watermark generator 450. The selected routine and the specific node at which that routine is to be inserted are such that the inserted routine will impart minimal, if any, adverse effect on execution of the application program at that point in the program flow. In addition, since there are likely to be far fewer routines in library 330 than there are edges inserted into graphs G′ and H′, each routine within library 330 is selected approximately the same number of times for insertion into the non-marked application program.
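The check-sum example involving routines r₃ and r₅ can be sketched as follows. This is an illustrative assumption rather than the actual routines: CRC-32 (via Python's `zlib.crc32`) stands in for whatever check-sum is used, the byte string is a made-up placeholder for a code segment, and the `shared` dictionary stands in for a variable located in another segment.

```python
import zlib

# Made-up bytes standing in for a pre-defined code segment of the program.
CODE_SEGMENT = bytes.fromhex("5589e583ec10c745fc00000000")
EXPECTED_SUM = zlib.crc32(CODE_SEGMENT)   # value recorded when the code is marked

shared = {"seg_sum": None}                # variable later read by another routine


def r3(segment: bytes) -> bool:
    """Check-sum a code segment, write the sum to the shared variable,
    and report whether it matches the recorded value."""
    shared["seg_sum"] = zlib.crc32(segment)
    return shared["seg_sum"] == EXPECTED_SUM


def r5() -> int:
    """Use the sum written by r3; the result is 0 only if the segment is intact."""
    return shared["seg_sum"] ^ EXPECTED_SUM
```

Here `r5` depends on data produced by `r3`, so removing `r3` or altering the checked segment changes what `r5` computes. This is one simple way to picture the data dependencies that the inserted routines create between distant parts of the program.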

Generator 450, in addition to receiving, as input, the label of the selected routine, also receives, as symbolized by line 445, for each inserted edge, identification of the nodes for that edge, and, as symbolized by lines 433 and 437, cluster flow graphs G′ and H′. With this input information, generator 450 inserts the edge into cluster flow graphs G′ and H′ (or either one of these graphs as appropriate) to span the nodal pair and also inserts the selected routine at one of these two nodes. The insertion process is simplistically represented by combiner 455. All the edges collectively and effectively splice the two clustered flow graphs G′ and H′ together into a single combined flow graph J which appears at an output of generator 450 and is routed, as symbolized by line 457, to code generator 460. Executable code is then produced, by generator 460, which corresponds to that depicted in the combined flow graph J. Generator 460 is conventional in nature. In essence, generator 460, in response to the actual code for program portions G and H, appearing on respective lines 403 and 407, and the code for all the selected routines appearing on line 485, completely assembles executable code to fully implement the flow depicted in combined flow graph J and thus produce, on output line 490, a new, though marked, executable version, X′, of the application program. To frustrate third party detection of any of the inserted routines, generator 460 distributes the associated code for these routines approximately uniformly throughout the marked program such that the flow patterns for the marked and non-marked versions of the application program do not substantially differ from one another.

As noted above, the watermark is collectively defined by the routines and edges (the edges collectively implementing a pattern of associated execution flows) that have been added to the unmarked code and, when used, any parameter (cryptographic code) that is detected from corresponding graphs using, e.g., the procedures (e.g., with or without error correction) described above. By virtue of the manner through which the edges and routines are inserted as well as a relatively large number of such inserted edges, the resulting flow pattern for the marked program is sufficiently complex such that the watermark cannot be readily discerned by a third party adversary.

To fully appreciate resulting execution (control) flow among various inserted routines, consider FIG. 6. This figure depicts illustrative edge and associated routine insertion in cluster flow graphs G′ and H′, and resulting control flow among various inserted routines. For simplicity, this figure only shows one inserted edge; though, in actuality, a considerable number of edges will be inserted in a non-marked application program.

Graphs G′ (also denoted 610) and H′ (also denoted as 620) contain illustrative clusters 613, 615 and 617; and 623, 625, 627 and 629, respectively. For simplicity, only a few of the clusters in each graph are explicitly shown. Each of these clusters, excluding the code for the associated routine, represents a relatively large block of code in the non-marked application program. Specifically, clusters 623, 625 and 613 represent code blocks P₁, P₂ and P₃, respectively.

Assume that watermark generator 450 (see FIG. 4) has inserted, as shown in FIG. 6, routine r₁ (also denoted as 330₁) within code block P₃, and that routines t₁ and t₂ located in code blocks P₁ and P₂, respectively, are preexisting within the non-marked application program. Execution would normally flow in the non-marked application program itself, without any diversion, via edge 650, just from routine t₁ directly to routine t₂.

Edge definition operation 440 (see FIG. 4) has inserted edge 630, as shown in FIG. 6, that spans a nodal pair having separate selected nodes within G′ and H′ and specifically one node, in that pair, within code block 613 and another node, in that pair, in code block 623. Here, as is typical, executable code for routine r₁ is inserted at a node, in the pair and in block P₃, remote from the other node, in the pair and located in block P₁, situated within a normal execution flow of the non-marked application program. As such, execution will flow from routine t₁ in block P₁, via edge 630 and in a direction indicated by dashed line 635, to routine r₁ in block P₃ and, once this routine has completed its execution, will return, along this edge and in a direction shown by dashed line 640, back to block P₁. From there, execution will follow its normal flow, within the application program and specifically via edge 650 and in a direction shown by dashed line 655, to routine t₂ in block P₂. Thus, as one can appreciate, execution flow of the non-marked application program, as modified by our inventive technique to yield the watermarked program, will be repeatedly diverted from its normal path, i.e., that associated just with the non-marked program, to execute each of the inserted routines.
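The diversion just described for FIG. 6 can be traced with a toy program. The sketch below is illustrative only: the functions merely record the order in which they run, and their bodies are hypothetical stand-ins for routines t₁, t₂ and r₁.

```python
def r1(trace):
    """Inserted routine whose code resides in block P3."""
    trace.append("r1 in P3")


def t1(trace):
    """Preexisting routine in block P1."""
    trace.append("t1 in P1")
    r1(trace)   # inserted edge 630: divert to P3 (635), then return to P1 (640)


def t2(trace):
    """Preexisting routine in block P2."""
    trace.append("t2 in P2")


def marked_program():
    """Run the marked flow and return the order of execution."""
    trace = []
    t1(trace)
    t2(trace)   # normal flow along edge 650 (655)
    return trace
```

Calling `marked_program()` returns the order t₁, r₁, t₂, whereas the non-marked program would simply run t₁ followed by t₂.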

To further thwart detection of any inserted routine (such as routines r₁ through r₄), executable code that implements that routine (including appropriate jump instructions) can itself be diffused (scattered in a noncontiguous fashion, i.e., in noncontiguous locations) throughout an entire corresponding code block for a cluster, or even across multiple blocks for multiple clusters, in the non-marked application program rather than contiguously in an address space associated with or appended to a single block. Such scattering need not occur at an equal rate (e.g., two instructions of the inserted routine per every 200 instructions of the non-marked application program) throughout the block.
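One way to picture such diffusion is the sketch below, which places the instructions of an inserted routine into randomly chosen, distinct gaps of a code block so that no two of them end up adjacent. It is an assumption-laden illustration: instructions are plain strings, the function name `scatter` is hypothetical, and the jump instructions that would chain the scattered pieces in real code are omitted.

```python
import random


def scatter(block: list, routine: list, seed: int) -> list:
    """Place routine instructions in random, distinct gaps of a code block.

    The relative order of both instruction streams is preserved. Requires
    len(routine) <= len(block) + 1, i.e., at most one instruction per gap.
    """
    rng = random.Random(seed)
    slots = sorted(rng.sample(range(len(block) + 1), len(routine)))
    out, r = [], 0
    for i in range(len(block) + 1):
        if r < len(routine) and slots[r] == i:
            out.append(routine[r])         # routine instruction goes in gap i
            r += 1
        if i < len(block):
            out.append(block[i])           # followed by the block's own instruction
    return out
```

Because the gaps are drawn at random, the spacing between scattered instructions is uneven, consistent with the observation above that scattering need not occur at an equal rate.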

3. Hardware

FIG. 7 depicts a block diagram of PC 700 on which our present invention can be implemented.

As shown in FIG. 7, PC 700 comprises input interfaces (I/F) 720, processor 740, communications interface 750, memory 730 and output interfaces 760, all conventionally interconnected by bus 770. Memory 730 generally includes different modalities, including illustratively random access memory (RAM) 732 for temporary data and instruction store, diskette drive(s) 734 for exchanging information, as per user command, with floppy diskettes, and nonvolatile mass store 735 that is implemented through a hard disk, typically magnetic in nature. Mass store 735 may also contain a CDROM or other optical media reader (not specifically shown) (or writer) to read information from (and write information onto) suitable optical storage media. The mass store stores operating system (O/S) 737 and application programs 739; the latter illustratively containing watermarked code generation procedure 800 (which incorporates our inventive technique) and routine library 330 (r) (see FIG. 5). O/S 737, shown in FIG. 7, may be implemented by any conventional operating system, such as the WINDOWS NT operating system (“WINDOWS NT” is a registered trademark of Microsoft Corporation of Redmond, Wash.). Given that, we will not discuss any components of O/S 737 as they are all irrelevant. Suffice it to say that application programs 739 execute under control of the O/S.

Incoming information can arise from two illustrative external sources: network supplied information, e.g., from the Internet and/or other networked facility, through network connection 755 to communications interface 750, or from a dedicated input source, via path(s) 710, to input interfaces 720. Dedicated input can originate from a wide variety of sources, e.g., via a dedicated link or an external source. In addition, input information can also be provided by inserting a diskette containing an input file(s) (G and H) of non-marked code and, where applicable, security code portions into diskette drive 734 from which computer 700 will access and read that file(s) from the diskette. Input interfaces 720 contain appropriate circuitry to provide necessary and corresponding electrical connections required to physically connect and interface each differing dedicated source of input information to computer system 700. Under control of the operating system, application programs 739 may exchange commands and data with the external sources, via network connection 755 or path(s) 710, to transmit and receive information, to the extent needed if at all, during program execution.

Input interfaces 720 also electrically connect and interface user input device 795, such as a keyboard and mouse, to computer system 700. Display 780, such as a conventional color monitor, and printer 785, such as a conventional laser printer, are connected, via leads 763 and 767, respectively, to output interfaces 760. The output interfaces provide requisite circuitry to electrically connect and interface the display and printer to the computer system. As one can appreciate, our present inventive software watermarking technique can be used to watermark any type of code regardless of the modalities through which PC 700 will obtain, store and/or communicate that code.

Furthermore, since the specific hardware components of PC 700 as well as all aspects of the software stored within memory 730, apart from the various software modules, as discussed below, that implement the present invention, are conventional and well-known, they will not be discussed in any further detail.

4. Software

FIGS. 8-10 collectively depict high-level flowcharts of salient software procedures (modules), which execute on PC 700, for implementing our present invention, with specifically FIG. 8 depicting a high-level flowchart of watermarked code generation procedure 800. This procedure implements the process provided by watermarked code generation process 300 shown in FIG. 4. For ease of understanding, the reader should simultaneously refer to both FIGS. 4 and 8 throughout the following discussion.

Upon entry into procedure 800, execution first proceeds to block 810. This block, when executed, reads input values of secret parameters k, M and λ. Thereafter, execution proceeds to block 820 which reads input code portion G and then performs flow analysis on that portion to yield corresponding flow graph {tilde over (G)}. Once this flow graph is fully produced, then execution proceeds to block 830 which reads input code portion H and then performs flow analysis on that portion to yield corresponding flow graph {tilde over (H)}. Blocks 820 and 830 implement flow analysis operations 413 and 417, respectively. After flow graph {tilde over (H)} is fully produced, block 840 is executed to k-partition flow graph {tilde over (G)} into cluster flow graph G′. Thereafter, block 850 is executed to k-partition flow graph {tilde over (H)} into cluster flow graph H′. Blocks 840 and 850 implement k-partitioning operations 423 and 427, respectively. Once block 850 fully executes, execution then proceeds to block 860.

Block 860, when executed, invokes edge insertion procedure 900 (to be discussed below in conjunction with FIGS. 9A and 9B) to randomly insert M separate edges and routines collectively between clusters in graphs G′ and H′, and, if desired, different clusters solely within each of graphs G′ and H′, to yield single combined flow graph J. This procedure, which implements edge definition operation 440, provides: random selection of nodes in both and, where desired, either one of the cluster flow graphs G′ and H′ to form nodal pairs; insertion of edges to connect each such nodal pair; and selection of a proper routine for insertion along the edge defined by each nodal pair and selection of a particular node in that pair at which that routine is to be inserted.

Once procedure 900 fully executes, execution proceeds to block 870, which implements code generator 460. This block, using input code portions G and H, and library 330, constructs executable code corresponding to the representation depicted in combined flow graph J. In addition, block 870 distributes, as noted above, the code for the individual inserted routines approximately uniformly throughout the executable code. Once this executable code is fully generated, execution passes to block 880 which simply provides this code as output watermarked code. Thereafter, execution exits from procedure 800.

FIGS. 9A and 9B collectively depict a high-level flowchart of edge insertion procedure 900 that is executed as part of procedure 800 shown in FIG. 8; the correct alignment of the drawing sheets for FIGS. 9A and 9B is shown in FIG. 9.

As shown in FIGS. 9A and 9B, upon entry into routine 900, execution first proceeds to block 905 which, when executed, initializes contents of counter i to one. Thereafter, execution proceeds to a loop formed of blocks 910 through 960 to: randomly select nodes that are to form nodal pairs; select a proper routine for insertion at an edge defined by each such nodal pair and select a particular node in that pair at which that routine is to be inserted; and connect the nodes in each pair together to form M separate edges.

In particular, upon entry into this loop, execution first proceeds to block 910. This block, when executed, selects a specific flow graph(s) to which edge(i) is to be added such that edges (links) are added to all such graphs in accordance with corresponding pre-defined distributions. Illustratively, such distributions may call for approximately the same number of edges to be inserted that span both cluster flow graphs as are collectively contained solely within graph G′ and within graph H′, or a different allocation of edges between and within each of the graphs. The specific distributions that are used, in any one instance, are not critical provided that an adequately large number of edges is inserted that span both cluster flow graphs G′ and H′ so as to yield a sufficiently complex flow pattern that thwarts third party detection of the complete watermark. Moreover, since all edges are effectively chosen at random, then the edges can be inserted in essentially any order between both flow graphs and, where appropriate, solely within either of the flow graphs. Once the appropriate graph(s), such as both G′ and H′ or solely G′ or solely H′, are selected for edge(i), then execution proceeds to decision block 915. This block appropriately routes execution based on the selected graph(s). In particular, if edge(i) is to be inserted solely within graph G′, then decision block 915 directs execution, via path 916, to block 925. Block 925 selects, typically randomly, two nodes in graph G′, though with each node in a different cluster, as endpoints for edge(i). Alternatively, if edge(i) is to be inserted that spans both graphs G′ and H′, then decision block 915 directs execution, via path 918, to block 930. Block 930 selects, typically randomly and illustratively through invoking node selection procedure 1000 (to be discussed below in conjunction with FIG. 10), two nodes, one in graph G′ and the other in graph H′ as endpoints for edge(i). Lastly, if edge(i) is to be inserted solely within graph H′, then decision block 915 directs execution, via path 922, to block 935. Block 935 selects, typically randomly, two nodes in graph H′, though with each such node in a different cluster, as endpoints for edge(i). In any instance, a node can be common to two inserted edges. Thereafter, after block 925, 930 or 935 executes, execution proceeds, via corresponding path 936, 938 or 942, to block 945. Also, once appropriate nodal pairs have been selected, in the manner described above, further pairs can be selected by randomly picking a pair of nodes from a union of the graphs G′ and H′ to yield further edges.

Block 945 selects, for edge(i) and particularly with respect to the location of each node in this edge in the corresponding program flow, an appropriate routine from library 330 (see FIG. 3) that is to be inserted along this edge. Once this routine is selected, execution proceeds to block 950, as shown in FIGS. 9A and 9B. This block, when executed, connects the nodal pair for edge(i) together to effectuate this edge, and selects one of these two nodes in one flow graph as an insertion point for the selected routine, such that this routine causes minimal, if any, adverse effect on execution flow. Typically, where an inserted routine is to be invoked when program execution reaches one node in a nodal pair, a call (or jump instruction) to that routine will be located at that node but with executable code for that routine being situated at the opposite node in that pair. Once block 950 fully executes, execution proceeds to decision block 955 which determines, based on current contents of counter i, whether all M edges have been defined. Specifically, if the contents have a value less than M, then decision block 955 routes execution, via NO path 957, to block 960. This latter block increments the contents of the counter by one. Thereafter, execution loops back, via path 965, to block 910 to select appropriate flow graph(s) for the next edge, and so forth. Alternatively, if the current value of counter i equals M, then all M edges have been generated; hence, decision block 955 routes execution, via YES path 959, to block 970. This latter block, when executed, provides, as output, single combined flow graph J that through the addition of edges and routines has effectively spliced together graphs G′ and H′. Thereafter, execution simply exits from routine 900.
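The loop of procedure 900 can be summarized in the following minimal sketch, which is offered under several stated assumptions rather than as the actual implementation. Clusters are plain lists of node labels; the parameter `p_cross` is a hypothetical stand-in for the pre-defined distribution of edges between and within the graphs; and the routine is chosen as the least-used one so that each library routine is selected approximately the same number of times, whereas block 945 as described would also weigh the location of the nodes in the program flow.

```python
import random
from collections import Counter


def insert_edges(clusters_g, clusters_h, library, m, seed, p_cross=0.5):
    """Return m (node_a, node_b, routine) triples plus per-routine usage counts.

    clusters_g / clusters_h: lists of clusters, each a non-empty list of node
    labels; each graph is assumed to have at least two clusters.
    p_cross: assumed share of edges that span both graphs.
    """
    rng = random.Random(seed)
    usage = Counter({name: 0 for name in library})
    edges = []
    for _ in range(m):                                # counter i runs from 1 to M
        if rng.random() < p_cross:                    # edge spans G' and H'
            a = rng.choice(rng.choice(clusters_g))
            b = rng.choice(rng.choice(clusters_h))
        else:                                         # edge lies within one graph
            graph = rng.choice([clusters_g, clusters_h])
            ca, cb = rng.sample(range(len(graph)), 2)  # two different clusters
            a, b = rng.choice(graph[ca]), rng.choice(graph[cb])
        routine = min(library, key=lambda name: usage[name])  # least-used routine
        usage[routine] += 1
        edges.append((a, b, routine))
    return edges, usage
```

With a library of three routines and M = 30, each routine is used exactly ten times in this sketch. The same node may appear in more than one edge, as the description permits.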

FIG. 10 depicts a flowchart of node selection procedure 1000, which is executed as part of procedure 900 shown in FIG. 9. This procedure implements an illustrative heuristic for selecting two nodes in a nodal pair that is to span graphs G′ and H′.

Upon entry into routine 1000 shown in FIG. 10, execution first proceeds to block 1010. This block, when executed, randomly picks a node, U, in cluster flow graph G′. Thereafter, execution proceeds to block 1020. Here, with λ being pre-defined as equaling (a number of edges that are to transit between G′ and H′)/(a number of edges connected to U), block 1020 randomly selects, with a probability of 1−λ, a node, Y, in graph H′. Then, block 1030 randomly chooses, with a probability of λ, a node, Z other than U, in graph G′. Once this occurs, block 1040 executes to provide, as output, the designations of two nodes Y and Z as a nodal pair. Thereafter, execution exits from procedure 1000.
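One possible reading of blocks 1010 through 1040 is sketched below. This reading is an assumption: it treats the choice between Y and Z as a single random draw and pairs the chosen node with U, whereas the text names Y and Z as the output designations. The function name and the list representation of the graphs are likewise hypothetical.

```python
import random


def select_nodal_pair(nodes_g, nodes_h, lam, rng):
    """Pick node U in G', then a partner: Y in H' with probability 1 - lam,
    otherwise Z (other than U) in G'. Assumes nodes_g has at least two nodes."""
    u = rng.choice(nodes_g)                                   # block 1010
    if rng.random() < 1.0 - lam:                              # block 1020
        partner = rng.choice(nodes_h)                         # node Y in H'
    else:                                                     # block 1030
        partner = rng.choice([n for n in nodes_g if n != u])  # node Z in G'
    return u, partner                                         # block 1040
```

For example, with `lam = 0.0` every selected pair spans G′ and H′, and with `lam = 1.0` every pair lies within G′; intermediate values mix the two kinds of pair.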

Nodal pairs may be selected using any one of many different ways, apart from that shown in FIG. 10 and discussed above. Another such way involves using bipartite graphs. In particular, we note here that the graphs G′ and H′ may have clusters which may be unequal in number. From graph G′, pick a pre-determined number of nodes, for each cluster, which need not be all equal. Let the number of nodes be L. Then, similarly pick L′ (often a pre-determined number, but possibly given by a probabilistic distribution) nodes from graph H′. Now, construct a random bipartite graph with the L and L′ nodes as independent sets and with an average degree of each node being small and pre-determined. Then, add edges among the nodes in both of the independent sets using any standard method of random graph generation, once again keeping the average degree small and, for example, comparable to the average degree of the nodes in the original graphs G′ and H′.
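The bipartite construction can be sketched as follows. This is a minimal illustration under assumptions: uniform sampling of distinct cross pairs stands in for "any standard method of random graph generation", the function name is hypothetical, and the subsequent step of adding edges within each independent set is omitted.

```python
import random


def random_bipartite_edges(left, right, avg_degree, rng):
    """Link nodes picked from G' (left) to nodes picked from H' (right).

    The edge count E is chosen so that the average degree, 2E / (L + L'),
    is approximately avg_degree.
    """
    n_edges = round(avg_degree * (len(left) + len(right)) / 2)
    all_pairs = [(a, b) for a in left for b in right]
    return rng.sample(all_pairs, min(n_edges, len(all_pairs)))
```

With L = 6, L′ = 4 and an average degree of 2, this produces 10 distinct edges, each joining one node from each set.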

The number of edges that connect to U is implied by the process above. Generally speaking, the total number of edges to be added is selected based, generally qualitatively, on how much tamper-resistance is desired while not making code growth too large or a performance penalty too significant. Thus, one must avoid inserting edges into busy sections of unmarked code (e.g., inner body of a loop). Fortunately, such “busy” sections can be readily detected using standard software profiling techniques.
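The use of profiling data to steer edges away from busy code can be pictured with the short sketch below. It assumes, purely for illustration, that a profiler has already produced an execution count per node; the function name, the threshold and the sample counts are hypothetical.

```python
def insertion_candidates(exec_counts: dict, max_count: int) -> list:
    """Return, in sorted order, the nodes whose profiled execution count
    is at or below max_count."""
    return sorted(n for n, c in exec_counts.items() if c <= max_count)


# Hypothetical profile: the inner loop body runs far more often than the rest.
profile = {"init": 1, "parse": 40, "inner_loop": 1_000_000, "exit": 1}
candidates = insertion_candidates(profile, 100)
```

Here `candidates` excludes `inner_loop`, so no edge endpoint would be chosen inside the busy loop body.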

Though we described our invention in terms of watermarking software intended for use with a PC, our invention is clearly not so limited in its application. In that regard, the present invention can be used to impart tamper-resistance to any program code designed to execute on nearly any type of computer, not just a PC. Such other types of computers illustratively include workstations, minicomputers, mainframe computers, and specialized computers such as industrial controllers and telephony switching system computers.

The various secrets, e.g., the values k, M and λ, as described above, for detecting the watermark may be contained in a secure processor that has secure memory (whose contents cannot be altered or inferred) and is capable of secure execution (i.e., it executes given code in such a way that its execution cannot be altered). Alternately, this processing can be implemented using O/S features in software. Another possibility is that secure memory and secure execution can be simulated in software such that the protected object will be self-enforcing.

While we have described the inserted routines as providing functionality (operations) required by the protected program and terminating that program if any one such routine were to be removed by a third party adversary, the routines are clearly not so limited in their use. In that regard, one or more of the inserted routines can provide functionality totally unrelated to, having no effect on and completely separate from that provided by the protected program. Such functionality can include restricting access to the protected program or other protected object. See, e.g., that described in co-pending United States patent application “Passive and Active Software Objects Containing Boreresistant Watermarking”, Ser. No. 09/315,733, filed May 20, 1999 and assigned to the same assignee hereof, which is incorporated by reference herein.

Use of the present invention advantageously permits a standalone feature, such as a general security function, as implemented by one or more of the routines, to be easily and securely integrated into any non-marked computer program. In fact, many different functions could be added to a single non-marked program.

For example, if the watermarked program is an application program designed to execute under a certain operating system(s), then one inserted routine could establish a network connection over, e.g., the Internet, and then transmit, over that connection, a hardware identifier for a user computer to a manufacturer of an operating system then executing on that computer; and then, in return, receive, from the software manufacturer and store in, e.g., an O/S registry, an install identifier containing a signed version of the machine identifier. Another such routine could subsequently access the hardware identifier and compare that identifier to the one contained in the install identifier. If both matched, program execution would proceed unaffected under the O/S. Alternatively, if a mismatch were detected, such as would arise if an image of the O/S were to have been copied to and is now executing on another computer (which would inherently contain a different hardware identifier), then the routine could inform the O/S appropriately and display a warning message and instruct the O/S to cease further execution of any application programs.
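The comparison of a hardware identifier against a signed install identifier can be sketched as follows. The sketch is a simplification under stated assumptions: a keyed hash (HMAC-SHA-256 from Python's standard `hmac` and `hashlib` modules) stands in for the manufacturer's signature, the key, identifier format and function names are hypothetical, and the network exchange and registry storage are omitted. An actual scheme would more likely use a public-key signature so that no signing secret resides on the user computer.

```python
import hashlib
import hmac

MANUFACTURER_KEY = b"demo-key"   # hypothetical; stands in for a signing key


def issue_install_id(hardware_id: str) -> str:
    """Manufacturer side: return the hardware identifier plus a keyed tag."""
    tag = hmac.new(MANUFACTURER_KEY, hardware_id.encode(), hashlib.sha256).hexdigest()
    return f"{hardware_id}:{tag}"


def check_install_id(current_hardware_id: str, install_id: str) -> bool:
    """Inserted routine: True only if the install identifier is authentic
    and names the machine on which the program is now running."""
    stored_id, _, tag = install_id.rpartition(":")
    expected = hmac.new(MANUFACTURER_KEY, stored_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected) and stored_id == current_hardware_id
```

If an install identifier issued for one machine is checked on a machine reporting a different hardware identifier, `check_install_id` returns False, which corresponds to the mismatch case in which the routine would warn the user and instruct the O/S to cease further execution.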

Clearly, a myriad number of other standalone security functions and/or other nonsecurity features could be implemented by corresponding routine(s) inserted into non-marked executable code through our inventive watermarking technique. By binding these routines very tightly to the non-marked program and with randomized insertion into and throughout the program, these features will collectively exhibit an extremely complex control flow. This flow complexity will significantly frustrate a third party adversary, who uses conventional software analysis tools, in detecting and circumventing all the functions and features implemented by the inserted routines.

In addition, though we have described our inventive technique as utilizing two separate input code portions and generating a partitioned flow graph for each such portion, our inventive technique, with readily apparent simplifications, will function in those situations where only one such input code portion is used. In this case, edges and accompanying routines would be inserted along edges defined by randomly selected nodal pairs spanning different (and/or even the same) clusters in just one partitioned flow graph for that input code portion. The result would be a watermark that is likely to be simpler, and hence easier for an adversary to discern and thus less optimal, than that attained through use of two separate input code portions. Furthermore, our inventive technique could also be used to insert edges and routines among more than two separate input code portions, e.g., three or more, if desired. However, the complexity of the overall processing to accomplish this result may be excessive for the marginal benefits that could be gained. In that regard, we believe that use of two separate input code portions will suffice to create a watermark that is sufficiently immune to adversarial detection and circumvention through use of conventional flow analysis techniques.

Moreover, though the flow graphs are described as showing control/data flow on an instruction-based level for an unmarked program and hence insertion of executable routines at an “instructional” level, where security restraints can be considerably loosened, such flow graphs can be based on control/data flow that occurs at a higher level, such as on a block-by-block basis, with each node representing a routine or other execution block formed of a group of instructions, with insertion of such routines occurring at that level.

Furthermore, though we have described our invention in the context of software-implemented objects, whether in the specific context, as described above, of executable software code or, as noted above, a data object, such as, e.g., an audio or video file, these objects can also be hardware-related. In that regard, a design of an integrated circuit, such as a dedicated logic chip or other digital processing device that embodies a predefined operational flow, whether it be in a parallel, pipelined or sequential fashion, in which resulting operations and accompanying control/data flow can be graphically depicted as nodes with interconnection edges, can be viewed as such an object and hence encompassed within this term. Through use of a suitable set of library routines, which, in a similar fashion to the library of routine(s) discussed above, implement added functionalities, and randomized edge addition procedures, such as those discussed above, a resulting graph will be generated that, from the standpoint of the input/output functionality which the chip or device is intended to implement, is functionally equivalent thereto but advantageously of significantly increased complexity and hence significantly more difficult to decipher from analysis of the input/output behaviour of the chip or device. The resulting graph would contain a watermarked object generated through use of our invention.

Although one embodiment which incorporates the teachings of the present invention has been shown and described in detail herein, those skilled in the art can readily devise many other embodiments, modifications and applications of the present invention that still utilize these teachings.

1. An apparatus for forming an identifier for an input object and for securely marking the input object with the identifier so as to yield a marked object, the apparatus comprising: a processor; and a memory having computer executable instructions stored therein; and wherein the processor, in response to the stored executable instructions: generates a flow representation for the input object, the representation having a plurality of nodes, said nodes representing predefined first operations performed by the input object, and connections among the nodes signifying associated flow among the predefined first operations performed by the input object; partitions the flow representation into k-clusters each so as to yield a cluster flow representation, at least one of the k-clusters including two or more of the plurality of the nodes; randomly selects first and second nodes from the plurality of nodes in the cluster flow representation so as to form a pre-defined number of nodal pairs, each of said pairs having one of the first nodes and a corresponding one of the second nodes; wherein the processor, in response to the stored instructions, randomly selects the first and second nodes from different clusters within the cluster flow representation; and for each of the nodal pairs, establishes flow between the first and second nodes in said each nodal pair and inserts, in the flow so established, a selected one of a plurality of different pre-defined second operations so as to collectively define the marked object, whereby the marked object implements the predefined first operations and a plurality of selected ones of the predefined second operations, each of which has been randomly spliced into flow of the input object, wherein the identifier collectively comprises all the different ones of the plurality of predefined second operations, and the associated execution flow associated therewith and involving the nodal pairs.
2. The apparatus in claim 1 wherein the input object comprises a software object.
3. The apparatus in claim 2 wherein the software object comprises input executable code, at least one instruction in the input executable code is associated with a corresponding one of the predefined first operations, and executable code for a corresponding executable procedure is associated with each selected one of the predefined second operations.
4. The apparatus in claim 3 wherein the processor, in response to the stored instructions: inserts a pre-defined number of separate links and designations for the selected ones of the procedures into the cluster flow representation so as to yield a combined flow representation; and converts, in response to said input executable code and executable code for said selected ones of the procedures, said combined flow representation into output executable code, said output executable code being the marked code.
5. The apparatus in claim 4 wherein the input executable code comprises first and second portions thereof and the flow representation comprises first and second separate flow representations for the first and second portions of the input executable code, respectively.
6. The apparatus in claim 1 wherein the processor, in response to the stored instructions, inserts executable code for the selected one procedure in noncontiguous locations in the input executable code.
7. The apparatus in claim 1 wherein the processor, in response to the stored instructions, selects the procedure from a pre-defined library of stored routines, wherein said procedure is one of the stored routines.
8. The apparatus in claim 7 wherein each of the inserted procedures implements, when executed, a pre-defined function such that if any of said inserted procedures is removed from the marked code, the marked code, when subsequently executed, will terminate its execution.
9. The apparatus in claim 7 wherein at least one of the inserted procedures implements, when executed, a pre-defined function which is independent of functionality provided by the non-marked software object.
10. For use with a computer system having a processor and a memory, the memory having computer executable instructions stored therein, a method for forming an identifier for input executable code and for securely marking the input executable code with the identifier so as to yield marked code, the method comprising the steps of: generating a flow representation for the input executable code, the representation having a plurality of nodes, said nodes representing first operations in the input executable code, and connections among the nodes signifying associated control flow among first operations in the executable code; partitioning the flow representation into k-clusters each so as to yield a cluster flow representation, at least one of the k-clusters including two or more of the plurality of the nodes; randomly selecting first and second nodes from the plurality of nodes in the cluster flow representation so as to form a pre-defined number of nodal pairs, each of said pairs having one of the first nodes and a corresponding one of the second nodes; wherein the randomly selecting comprises randomly selecting the first and second nodes from different clusters within the cluster flow representation; and for each of the nodal pairs, establishing execution flow between the first and second nodes in said each nodal pair and inserting, in the execution flow so established, executable code for a selected one of a plurality of different pre-defined second operations so as to collectively define the marked code, whereby the marked code contains the input executable code and a plurality of different ones of the pre-defined second operations each of which has been randomly spliced into control flow of the input executable code, wherein the identifier collectively comprises the executable code, for all the different ones of the plurality of predefined second operations, and the associated execution flows associated therewith and involving the nodal pairs.
11. The method in claim 10 wherein the input executable code comprises a software object.
12. The method in claim 11 wherein the software object comprises input executable code, at least one instruction in the input executable code is associated with a corresponding one of the predefined first operations and executable code for a corresponding executable procedure is associated with each selected one of the predefined second operations.
13. The method in claim 12 wherein the establishing and inserting step comprises the steps of: inserting a pre-defined number of separate links and designations for the selected ones of the procedures into the cluster flow representation so as to yield a combined flow representation; and converting, in response to said input executable code and executable code for said selected ones of the procedures, said combined flow representation into output executable code, said output executable code being the marked code.
14. The method in claim 13 wherein the input executable code comprises first and second portions thereof and the flow representation comprises first and second separate flow representations for the first and second portions of the input executable code, respectively.
15. The method in claim 10 further comprising the step of inserting executable code for the selected one procedure in noncontiguous locations in the input executable code.
16. The method in claim 10 further comprising the step of selecting the procedure from a pre-defined library of stored routines, wherein said procedure is one of the stored routines.
17. The method in claim 16 wherein each of the inserted procedures implements, when executed, a pre-defined function such that if any of said inserted procedures is removed from the marked code, the marked code, when subsequently executed, will terminate its execution.
18. The method in claim 16 wherein at least one of the inserted procedures implements, when executed, a pre-defined function which is independent of functionality provided by a non-marked software object.
19. A computer readable medium having computer executable instructions stored therein for performing the steps of claim 10.
20. Executable computer code embodied on one or more computer-readable media and securely marked with an identifier and generated by a computer system, the system having a processor and a memory, the memory having computer executable instructions stored therein, characterized by the code having been produced by the steps, implemented by the processor in response to the executable instructions, recited in claim 10.
21. For use with a computer system having a processor and a memory, the memory having computer executable instructions stored therein, a method for forming an identifier for input executable code and for securely marking the input executable code with the identifier so as to yield marked code, the method comprising the steps of: generating first and second separate flow representations for the input executable code, the input executable code including first and second portions thereof and the first and second separate flow representations corresponding to the first and second portions of the input executable code, respectively, the flow representations each having a plurality of nodes, said nodes representing first operations in the input executable code, and connections among the nodes signifying associated control flow among first operations in the executable code; partitioning each of the flow representations into k-clusters each so as to yield a cluster flow representation, at least one of the k-clusters including two or more of the plurality of the nodes; randomly selecting first and second nodes from the plurality of nodes in the cluster flow representation so as to form a pre-defined number of nodal pairs, each of said pairs having one of the first nodes and a corresponding one of the second nodes, wherein the first and second nodes are randomly selected from different clusters within the cluster flow representation; and for each of the nodal pairs, establishing execution flow between the first and second nodes in said each nodal pair and inserting, in the execution flow so established, executable code for a selected one of a plurality of different pre-defined second operations so as to collectively define the marked code, whereby the marked code contains the input executable code and a plurality of different ones of the pre-defined second operations each of which has been randomly spliced into control flow of the input executable code, wherein the identifier collectively comprises the executable code, for all the different ones of the plurality of predefined second operations, and the associated execution flows associated therewith and involving the nodal pairs.